A Synthetic Data Community for Sensitive Research Data in the UK

Welcome to the UK Synthetic Data Community Group, a collaborative network of researchers, public representatives and data owners dedicated to advancing the responsible use of synthetic data. Our mission is to develop robust governance frameworks and open-source tools that enable the use of synthetic data in sensitive data research. Together, we strive to ensure that this innovative approach expands research possibilities while prioritising ethical considerations and data privacy. Join us in building a vibrant community focused on harnessing the power of synthetic data.

Advancing the Utilisation of Synthetic Data

The UK SDCG is a collaborative network of researchers, data owners and public representatives working to advance the use of synthetic data in secure environments.

Governance

Developing governance frameworks to enable the utilisation of synthetic data in secure environments.

Tools

Creating open-source tools for the research data community to generate and evaluate synthetic data.

Standards

Setting standards for the evaluation and reporting of synthetic data.

Resources

Publishing learning resources to help researchers, data owners and the public understand more about synthetic data.

What is Synthetic Data?

Synthetic data is artificially generated data that typically aims to mimic the properties of a real dataset while protecting the privacy of the individuals in it. This makes it useful for many purposes, especially for sensitive data, which is often subject to strict governance and access controls.

With synthetic data, we can enable researcher training, algorithm development, federation of data environments and data discovery without the strict, lengthy approval processes typically associated with data stored in secure environments.

Workshops for Developing Governance & Standards in Synthetic Data

Synthetic data holds transformative potential for sensitive data, enabling innovation while safeguarding patient privacy. However, realising this potential requires robust governance and standardised frameworks to ensure safe and ethical use. This group aims to host several workshops with different stakeholders to establish clear protocols for data generation, evaluation, and deployment in secure environments. This is essential to build trust and unlock synthetic data’s full value in sensitive data applications.

Researcher Workshop

Understanding user requirements and use cases for different levels of fidelity of synthetic data.

Expert Workshop

Developing tools and setting standards for the generation and evaluation of synthetic data.

Data Owner Workshop

How to enable the deployment of synthetic data in a way that protects the original data and keeps the process transparent.

Public Workshop

How do we ensure that synthetic data is used for the public good and sufficiently protects the public's privacy?

Community Tools for the Generation & Evaluation of Synthetic Data

We are developing open-source tools for the generation and evaluation of synthetic data at various levels of fidelity. The SynthOpt package is built for the Trusted Research Environment (TRE) community to generate low-fidelity synthetic data using statistical techniques, or higher-fidelity synthetic data using generative AI. The package also includes tools for evaluating, visualising and reporting on synthetic data so that it can be used in practice.
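
As a rough illustration of the lowest fidelity level, the sketch below samples each column independently from the values observed in that column. This is one common low-fidelity technique, not necessarily the one SynthOpt implements; the function name `independent_marginals` and the toy data are hypothetical. Because each column is sampled separately, per-column distributions are roughly preserved while relationships between columns are broken. Individual values are still copied from the real data, so on its own this is not a privacy guarantee.

```python
import random
from typing import Dict, List


def independent_marginals(real: Dict[str, List], n_rows: int, seed: int = 0) -> Dict[str, List]:
    """Hypothetical helper: sample every column independently from its observed values.

    Each column keeps (approximately) its own distribution, but the links between
    columns within a single record are deliberately broken.
    """
    rng = random.Random(seed)
    return {col: [rng.choice(values) for _ in range(n_rows)] for col, values in real.items()}


# Toy "real" data, for illustration only.
real = {
    "age": [67, 72, 80, 58, 75],
    "diagnosis": ["AD", "MCI", "AD", "none", "MCI"],
}
synthetic = independent_marginals(real, n_rows=10, seed=42)
```

Sampling columns independently keeps the generator trivially simple and reveals little about how attributes co-occur, which is why this kind of output usually suits pipeline testing or researcher training rather than analysis.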

Use generative AI to create highly realistic synthetic data that closely resembles the real data.

Create transparency reports that document how the synthetic data was generated and evaluated.

Create structurally similar synthetic data from metadata instead of using the data itself.

Generate synthetic data at varying levels of fidelity for different purposes and use cases.

Ensure privacy is protected with differentially private techniques and comprehensive evaluations.
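
To make "differentially private techniques" concrete, here is a minimal sketch of the Laplace mechanism, a standard building block of differential privacy, applied to a single count. The page does not say which mechanisms SynthOpt uses, so treat this as a generic illustration; `dp_count`, `laplace_noise` and the example ages are hypothetical.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values: Iterable, predicate: Callable[[object], bool],
             epsilon: float, rng: random.Random) -> float:
    """Hypothetical helper: release a count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon. A smaller
    epsilon therefore means stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Illustrative data only: how many people are aged 70 or over?
ages = [67, 72, 80, 58, 75, 69, 81]
rng = random.Random(7)
noisy = dp_count(ages, lambda age: age >= 70, epsilon=1.0, rng=rng)
```

Each released statistic consumes privacy budget, so in practice the total epsilon across all queries or model training steps has to be tracked.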

SynthOpt Framework

A Python package to generate, evaluate and optimise synthetic data at various levels of fidelity.

Generate

Generate synthetic data using either statistical or machine learning methods. This allows the generation of structural, statistical, correlated and augmented synthetic data.
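
The structural level can be produced without touching any real records: only the metadata (column names, types and permitted ranges or categories) is needed. The sketch below assumes a hypothetical metadata dictionary format and a hypothetical `structural_from_metadata` function; SynthOpt's own metadata format is not described on this page. The output has the right shape and valid values but no statistical resemblance to the real data.

```python
import random
from typing import Any, Dict, List

# Hypothetical metadata format: types and allowed values only, no real records.
METADATA: Dict[str, Dict[str, Any]] = {
    "age": {"type": "int", "min": 18, "max": 100},
    "bmi": {"type": "float", "min": 15.0, "max": 45.0},
    "smoker": {"type": "category", "values": ["yes", "no", "former"]},
}


def structural_from_metadata(metadata: Dict[str, Dict[str, Any]],
                             n_rows: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate rows that respect the schema, sampling uniformly within each column's range."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row: Dict[str, Any] = {}
        for col, spec in metadata.items():
            kind = spec["type"]
            if kind == "int":
                row[col] = rng.randint(spec["min"], spec["max"])  # inclusive bounds
            elif kind == "float":
                row[col] = round(rng.uniform(spec["min"], spec["max"]), 1)
            elif kind == "category":
                row[col] = rng.choice(spec["values"])
            else:
                raise ValueError(f"unsupported type for {col!r}: {kind}")
        rows.append(row)
    return rows


rows = structural_from_metadata(METADATA, n_rows=5, seed=1)
```

Higher levels (statistical, correlated, augmented) would replace the uniform draws with distributions and dependencies learned from the real data.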

Evaluate

Includes metrics for evaluating the privacy, utility and quality of the synthetic data, as well as automatic report generation for transparency and evaluation.
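
The page does not list SynthOpt's specific metrics, so the sketch below shows two widely used examples, one for each side of the trade-off: the two-sample Kolmogorov-Smirnov statistic as a per-column utility measure, and the exact-match rate as a simple privacy check. Both function names are our own and hypothetical.

```python
from bisect import bisect_right
from typing import Hashable, List, Sequence


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0 = identical, 1 = completely separated)."""
    r, s = sorted(real), sorted(synth)
    gap = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    for x in set(r) | set(s):
        cdf_r = bisect_right(r, x) / len(r)
        cdf_s = bisect_right(s, x) / len(s)
        gap = max(gap, abs(cdf_r - cdf_s))
    return gap


def exact_match_rate(real_rows: List[Hashable], synth_rows: List[Hashable]) -> float:
    """Share of synthetic records that are exact copies of a real record."""
    real_set = set(real_rows)
    return sum(row in real_set for row in synth_rows) / len(synth_rows)
```

For example, `ks_statistic([1, 2], [3, 4])` is 1.0 because the samples do not overlap, and a synthetic set in which one of two rows duplicates a real row has an exact-match rate of 0.5. A full evaluation would combine many such metrics across columns and column pairs.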

Optimise

Optional optimisation of differentially private machine learning methods to find the optimal value of the privacy parameter for a given privacy/utility weighting.
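
One simple way to frame this optimisation is a grid search over candidate privacy parameters (epsilon values), scoring each by a weighted sum of privacy and utility. The page does not describe SynthOpt's optimiser, so this is only a sketch: `select_epsilon` is hypothetical, and the toy scoring curves stand in for scores that would in practice come from evaluating synthetic data generated at each epsilon.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def select_epsilon(
    candidates: Iterable[float],
    utility: Callable[[float], float],
    privacy: Callable[[float], float],
    privacy_weight: float = 0.5,
) -> Tuple[float, float]:
    """Return (epsilon, score) maximising privacy_weight * privacy + (1 - privacy_weight) * utility.

    Both scoring functions are assumed to return values in [0, 1], where higher is better.
    """
    if not 0.0 <= privacy_weight <= 1.0:
        raise ValueError("privacy_weight must lie in [0, 1]")
    best: Optional[Tuple[float, float]] = None
    for eps in candidates:
        score = privacy_weight * privacy(eps) + (1.0 - privacy_weight) * utility(eps)
        if best is None or score > best[1]:
            best = (eps, score)
    if best is None:
        raise ValueError("no candidate epsilon values supplied")
    return best


# Hypothetical toy curves: utility rises and privacy falls as epsilon grows.
def toy_utility(eps: float) -> float:
    return 1.0 - math.exp(-eps)


def toy_privacy(eps: float) -> float:
    return math.exp(-eps / 4.0)


grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best_eps, best_score = select_epsilon(grid, toy_utility, toy_privacy, privacy_weight=0.5)
```

With these toy curves an equal weighting picks epsilon = 2.0, while weighting privacy at 0.9 pushes the choice down to the smallest candidate, 0.1. A grid search is easy to audit and report, at the cost of only finding the best value among the candidates supplied.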

The DARE UK Synthetic Data Community Group Co-Chairs

Lewis Hotchkiss

Dementias Platform UK

Simon Thompson

Dementias Platform UK, SeRP UK, SAIL Databank

Emma Squires

Dementias Platform UK

John Gallacher

Dementias Platform UK, University of Oxford

Timothy Rittman

University of Cambridge

Anmol Arora

University College London

Emily Oliver

Administrative Data Research UK

Sophie McCall

Research Data Scotland

Fiona Lugg-Widger

Centre for Trials Research, Cardiff University

Robert Trubey

Centre for Trials Research, Cardiff University

Cristina Magder

UK Data Service

Steve Moore

Public Representative
